Coarse-to-Fine, Cost-Sensitive Classification of E-Mail
Abstract
In many real-world scenarios, computational constraints make it necessary to make judgments at differing levels of granularity. This is especially true when a large number of classifications must be performed in a real-time streaming setting and the time required to acquire different subsets of features varies significantly. In that setting, an intelligent strategy is needed to balance classification accuracy against computational cost. Accurate and timely email classification requires exactly this trade-off between classification granularity and feature acquisition cost. To address this problem, we introduce a Granular Cost-Sensitive Classifier (GCSC), which modulates the cost of feature acquisition according to the granularity of the classification: classification at a coarse level is inexpensive, while classification at finer levels of granularity is more costly. Our approach classifies messages with greater accuracy while incurring a lower feature acquisition cost than baseline classifiers that do not make use of cost information.
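The coarse-to-fine trade-off can be illustrated with a generic cost-aware cascade. This is a minimal sketch and not the authors' GCSC, whose internals are not described here. It assumes a hypothetical setup with three properties. First, each granularity level has its own classifier that returns a label and a confidence. Second, each feature has a fixed acquisition cost. Third, a finer level is attempted only when the coarser answer is not confident enough and the extra features still fit within a budget. All names (`Stage`, `coarse_to_fine`, the feature names, costs, and toy rules) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A classifier maps the features acquired so far to (label, confidence in [0, 1]).
Classifier = Callable[[Dict[str, float]], Tuple[str, float]]


@dataclass
class Stage:
    """One granularity level; stages are ordered from coarsest (cheapest) to finest."""
    name: str                  # hypothetical level name, e.g. "coarse"
    feature_keys: List[str]    # features this level's classifier reads
    classify: Classifier


def coarse_to_fine(
    acquire: Callable[[str], float],   # fetches one feature value (the costly step)
    costs: Dict[str, float],           # hypothetical per-feature acquisition cost
    stages: List[Stage],
    confidence_threshold: float,
    budget: float,
) -> Tuple[str, str, float]:
    """Return (level reached, label, total acquisition cost paid).

    Features are fetched lazily and cached, so a finer level pays only for
    features that coarser levels did not already fetch. The coarsest level
    always runs. Each later level runs only if the previous answer was not
    confident enough and its extra features fit in the remaining budget.
    """
    features: Dict[str, float] = {}
    spent = 0.0
    result: Optional[Tuple[str, str, float]] = None
    for stage in stages:
        missing = [k for k in stage.feature_keys if k not in features]
        extra = sum(costs[k] for k in missing)
        if result is not None and spent + extra > budget:
            break  # finer level is unaffordable: keep the coarser answer
        for k in missing:
            features[k] = acquire(k)
        spent += extra
        label, confidence = stage.classify(features)
        result = (stage.name, label, spent)
        if confidence >= confidence_threshold:
            break  # confident enough: skip the costlier features
    if result is None:
        raise ValueError("at least one stage is required")
    return result


# Toy message: two cheap header features and one expensive body feature (all hypothetical).
costs = {"n_recipients": 1.0, "has_list_header": 1.0, "spam_word_count": 10.0}
message = {"n_recipients": 40.0, "has_list_header": 0.0, "spam_word_count": 3.0}
fetched: List[str] = []  # records which features were actually paid for


def acquire(key: str) -> float:
    fetched.append(key)
    return message[key]


def coarse(f: Dict[str, float]) -> Tuple[str, float]:
    # Header-only toy rule: many recipients or a list header suggests bulk mail.
    if f["n_recipients"] > 20 or f["has_list_header"] > 0:
        return ("bulk", 0.9)
    return ("individual", 0.6)


def fine(f: Dict[str, float]) -> Tuple[str, float]:
    # Body-aware toy rule distinguishing finer categories.
    if f["spam_word_count"] > 2:
        return ("spam", 0.85)
    if f["has_list_header"] > 0:
        return ("newsletter", 0.8)
    return ("personal", 0.7)


STAGES = [
    Stage("coarse", ["n_recipients", "has_list_header"], coarse),
    Stage("fine", ["has_list_header", "spam_word_count"], fine),
]

print(coarse_to_fine(acquire, costs, STAGES, 0.95, 20.0))  # ('fine', 'spam', 12.0)
print(coarse_to_fine(acquire, costs, STAGES, 0.95, 5.0))   # ('coarse', 'bulk', 2.0)
```

With a confidence threshold of 0.95 and a budget of 20, the toy message pays for the expensive body feature and reaches the fine level. With a budget of 5, it stops at the coarse level after paying only for the two header features. The fixed threshold and budget are simplifying assumptions. A cost-sensitive learner could instead choose when to escalate.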